Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a high-level API for creating repodata #271

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

dralley
Copy link
Contributor

@dralley dralley commented May 28, 2021

This is a mostly-functional prototype of a more convenient (Python) API for writing repos that I threw together.

There's no obligation to merge this, but, is it something you would be interested in merging? I figure that 90% of this code is code that any consumer of the API would need to write / copy anyway (and I started with just the example code).

If the answer is yes, I can finish it off, write documentation and tests, etc. Otherwise we can just close.

@dralley
Copy link
Contributor Author

dralley commented May 28, 2021

Not sure why that test would be failing.

edit: Interesting, it passes on Fedora 33, and fails on Fedora 34

Did something change with libxml? Filed #272

@kontura
Copy link
Contributor

kontura commented Jun 2, 2021

yea the failing tests are unrelated, in addition to the libxml2 issue you described there is also a zchunk regression.

Regarding the PR it self, I personally would be fine with merging this. Some kind of a higher-level API that is in spirit closer to the createrepo_c binary seems nice to me.

@dralley dralley force-pushed the new-api branch 2 times, most recently from ef94963 to ccb8a6d Compare June 2, 2021 18:02
@Conan-Kudo
Copy link
Member

I really like the idea of a simpler high level API too.

@dralley
Copy link
Contributor Author

dralley commented Jun 8, 2021

Great, I'll keep working on it then.

@dralley dralley changed the title Add experimental new high-level API prototype for creating repodata Add a high-level API prototype for creating repodata Aug 7, 2021
@dralley dralley changed the title Add a high-level API prototype for creating repodata Add a high-level API for creating repodata Aug 7, 2021
@dralley dralley force-pushed the new-api branch 8 times, most recently from 34ec1be to e70835b Compare August 7, 2021 14:54
@dralley
Copy link
Contributor Author

dralley commented Aug 7, 2021

I have some API design questions, see the TODO notes

@dralley dralley force-pushed the new-api branch 4 times, most recently from 1b56c81 to babcaf3 Compare August 8, 2021 22:25
src/python/createrepo_c/__init__.py Outdated Show resolved Hide resolved
src/python/createrepo_c/__init__.py Outdated Show resolved Hide resolved
src/python/createrepo_c/__init__.py Outdated Show resolved Hide resolved
if preserve_existing_metadata:
x = 0
while True:
new_repodata_path = f"{self.repodata_path}_{x}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think createrepo_c binary and RepositoryWriter should be as consistent as possible. How about using something closer to --retain-old-md and --keep-all-metadata instead of preserve_existing_metadata?

Or do you prefer this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just what the example had been doing when I copied it over. I think there's a good argument to be made either way. It does seem a little silly to keep the metadata files around but throw away the repomd.xml that tied them together. If you keep the old repodata directories, it's fairly straightforwards to turn it back into a functional repo (assuming the packages are still there)

sqlite_metadata_info.writer.close()
else:
record = RepomdRecord(record_name, str(metadata_info.path))
record = record.compress_and_fill(self._metadata_checksum_type, BZ2_COMPRESSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While trying out the RepositoryWriter I noticed the created repositories are invalid. I think this section specifically exposes a bug in the library.

record.compress_and_fill(self._metadata_checksum_type, BZ2_COMPRESSION) returns a new record with filled in checksums while the old one (created on the line above - 603) is freed. However these two lines: https://github.com/rpm-software-management/createrepo_c/blob/master/src/repomd.c#L557-L558 make the new record use chunk from the old (freed) one. This leads to garbage values.
Could this be fixed as a part of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kontura It might take a while before this PR is ready depending on how much time I can dedicate to it. It should probably go in a separate PR. I can maybe take a shot at fixing it next week though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that is no problem.

I think we just need to change record->chunk to crecord->chunk for both of those lines. 🙂

@dralley
Copy link
Contributor Author

dralley commented Oct 31, 2023

@kontura If I go forwards with this, are you ok with commits 2 and 3?

@kontura
Copy link
Contributor

kontura commented Oct 31, 2023

Yes and maybe we could even merge metadata_compression with general_compression?

@dralley
Copy link
Contributor Author

dralley commented Jan 8, 2024

This is ready to be looked at. I don't claim it's perfect but I tried to get decent test coverage.

@dralley
Copy link
Contributor Author

dralley commented Jan 12, 2024

@kontura ^

@dralley
Copy link
Contributor Author

dralley commented Jan 24, 2024

Bump!

@kontura
Copy link
Contributor

kontura commented Jan 24, 2024

@dralley sorry I am a bit swamped now, it might take a while.

@jan-kolarik jan-kolarik self-requested a review February 1, 2024 14:37
@kontura kontura self-assigned this Mar 7, 2024
Copy link
Contributor

@kontura kontura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple of issues in one of the examples.

examples/python/simple_repository_writing.py Outdated Show resolved Hide resolved
examples/python/simple_repository_writing.py Outdated Show resolved Hide resolved
Abstracts away a lot of manual logic around setup and state tracking.
The existing API is frustratingly verbose and low-level. This will make
it trivial to create repositories with only a few lines.
@dralley
Copy link
Contributor Author

dralley commented Mar 11, 2024

Updated

@kontura
Copy link
Contributor

kontura commented Mar 11, 2024

Thank you!

@kontura kontura merged commit a4e4c45 into rpm-software-management:master Mar 11, 2024
7 checks passed
@dralley dralley deleted the new-api branch March 11, 2024 21:31
self.working_metadata_files["filelists"].writer.add_pkg(pkg)
self.working_metadata_files["other"].writer.add_pkg(pkg)

def add_repomd_metadata(self, name, path, use_compression=True):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kontura I didn't notice this until now (sorry), but add_repomd_metadata is a bit redundant. Should we fix that name? Or is it fine because we're adding metadata to "repomd"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to change it to just add_metadata?
I think I would like that a little bit more but I am also fine with the current form.

If you want to change it I believe we still can but the sooner the better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants